Smart Paper Notes

ConCL: Concept Contrastive Learning for Dense Prediction Pre-training in Pathology Images

Jiawei Yang , Hanbo Chen , Yuan Liang , Junzhou Huang , Lei He , Jianhua Yao

Category: Computer Vision

2022-07-14

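Detecting and segmenting objects within whole slide images is essential in computational pathology workflows. Self-supervised learning (SSL) is appealing for such annotation-heavy tasks. Although dense tasks on natural images have extensive benchmarks, such studies are unfortunately still absent from current work in pathology. Our paper intends to narrow this gap. We first benchmark representative SSL methods for dense prediction tasks in pathology images. Then, we propose concept contrastive learning (ConCL), an SSL framework for dense pre-training. We explore how ConCL performs with concepts provided by different sources and ultimately propose a simple, dependency-free concept generation method that does not rely on external segmentation algorithms or saliency detection models. Extensive experiments show that ConCL outperforms previous state-of-the-art SSL methods across different settings. Along our exploration, we distill several important and intriguing components that contribute to the success of dense pre-training for pathology images. We hope this work can provide useful data points and encourage the community to conduct ConCL pre-training for problems of interest. Code is available.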

translated by Google Translate
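
As a rough illustration of the idea in the ConCL abstract above, the sketch below pools dense features per concept and contrasts the resulting concept embeddings across two views with an InfoNCE-style loss. It is not the authors' implementation: the function names (`concept_pool`, `info_nce`, `concl_loss`), the temperature value, and the assumption that both views come with integer concept label maps are all hypothetical choices made for this example.

```python
import numpy as np

def concept_pool(features, concept_map, num_concepts):
    """Average the feature vectors of all pixels assigned to each concept.

    features: (H, W, D) dense feature map; concept_map: (H, W) integer ids.
    Returns the pooled (num_concepts, D) array and a mask of concepts present.
    """
    flat_feats = features.reshape(-1, features.shape[-1])
    flat_ids = concept_map.reshape(-1)
    pooled = np.zeros((num_concepts, flat_feats.shape[1]))
    present = np.zeros(num_concepts, dtype=bool)
    for c in range(num_concepts):
        mask = flat_ids == c
        if mask.any():
            pooled[c] = flat_feats[mask].mean(axis=0)
            present[c] = True
    return pooled, present

def info_nce(queries, keys, temperature=0.2):
    """InfoNCE loss where row i of `queries` should match row i of `keys`."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def concl_loss(feat_a, feat_b, concepts_a, concepts_b, num_concepts):
    """Contrast concept embeddings of two views; other concepts act as negatives."""
    pooled_a, present_a = concept_pool(feat_a, concepts_a, num_concepts)
    pooled_b, present_b = concept_pool(feat_b, concepts_b, num_concepts)
    shared = present_a & present_b  # only concepts visible in both views
    return info_nce(pooled_a[shared], pooled_b[shared])
```

In this sketch the loss is low when the same concept yields similar pooled features in both views and high when the concept assignment of one view is scrambled.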

ReMix: A General and Efficient Framework for Multiple Instance Learning based Whole Slide Image Classification

Jiawei Yang , Hanbo Chen , Yu Zhao , Fan Yang , Yao Zhang , Lei He , Jianhua Yao

Category: Computer Vision

2022-07-05

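Whole slide image (WSI) classification often relies on deep, weakly supervised multiple instance learning (MIL) methods to handle gigapixel-resolution images and slide-level labels. Yet the good performance of deep learning comes from exploiting large datasets and diverse samples, which calls for efficient training pipelines that scale to large datasets and for data augmentation techniques that diversify samples. However, current MIL-based WSI classification pipelines are memory-hungry and computationally inefficient, because they usually assemble tens of thousands of patches into bags for computation. On the other hand, data augmentation, although popular in other tasks, remains unexplored for WSI MIL frameworks. To address both issues, we propose ReMix, a general and efficient framework for MIL-based WSI classification. It comprises two steps: reduce and mix. First, it reduces the number of instances in WSI bags by substituting instances with instance prototypes, i.e., patch cluster centroids. Then, we propose a "Mix-the-bag" augmentation that contains four online, stochastic, and flexible latent-space augmentations. It brings diverse and reliable class-identity-preserving semantic changes in the latent space while enforcing semantic-perturbation invariance. We evaluate ReMix on two public datasets with two state-of-the-art MIL methods. In our experiments, consistent improvements in precision, accuracy, and recall are achieved, with training time and memory consumption reduced by orders of magnitude, demonstrating ReMix's effectiveness and efficiency. Code is available.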
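
The reduce and mix steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the released ReMix code: `reduce_bag` uses a plain k-means loop to obtain patch cluster centroids, and `mix_bags` shows one possible latent-space augmentation (interpolating toward the nearest prototype of another bag with the same label). The abstract does not name its four augmentations, so this interpolation, the function names, and the parameters `prob` and `lam` are assumptions made for the example.

```python
import numpy as np

def reduce_bag(instances, num_prototypes, num_iters=10, seed=0):
    """Replace a bag of patch features (N, D) with K k-means centroids (K, D)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(instances), num_prototypes, replace=False)
    centroids = instances[start].astype(float)
    for _ in range(num_iters):
        # Squared distance from every instance to every centroid: (N, K).
        dists = ((instances[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for k in range(num_prototypes):
            members = instances[assign == k]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[k] = members.mean(axis=0)
    return centroids

def mix_bags(query_bag, key_bag, prob=0.5, lam=0.5, seed=0):
    """Illustrative mix: move some query prototypes toward their nearest
    prototype in `key_bag`, which is assumed to share the query's class label."""
    rng = np.random.default_rng(seed)
    dists = ((query_bag[:, None, :] - key_bag[None, :, :]) ** 2).sum(axis=-1)
    nearest = key_bag[dists.argmin(axis=1)]
    chosen = rng.random(len(query_bag)) < prob  # stochastic, per prototype
    mixed = query_bag.copy()
    mixed[chosen] = lam * query_bag[chosen] + (1 - lam) * nearest[chosen]
    return mixed
```

Because the downstream MIL model would only see the K prototypes instead of all N patch features, the bag held in memory shrinks from N to K rows, which is the source of the efficiency gain the abstract describes.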

Seeking Common Ground While Reserving Differences: Multiple Anatomy Collaborative Framework for Undersampled MRI Reconstruction

Yan Jiangpeng , Yu Chenghui , Chen Hanbo , Xu Zhe , Huang Junzhou , Li Xiu , Yao Jianhua

Category: Computer Vision

2022-06-15

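Recently, deep neural networks have greatly advanced undersampled magnetic resonance image (MRI) reconstruction. Most studies follow a one-anatomy-one-network fashion, i.e., each expert network is trained and evaluated for a specific anatomy. Apart from the inefficiency of training multiple independent models, this convention ignores the de-aliasing knowledge shared across anatomies, which could benefit each of them. To explore the shared knowledge, one naive approach is to combine the data from all anatomies and train an all-round network. Unfortunately, despite the existence of shared de-aliasing knowledge, we reveal that the exclusive knowledge of different anatomies can deteriorate specific reconstruction targets, leading to an overall performance drop. Based on this observation, we present a novel deep MRI reconstruction framework with both anatomy-shared and anatomy-specific parameterized learners, aiming to "seek common ground while reserving differences" across anatomies. In particular, the primary anatomy-shared learners are exposed to different anatomies to model the rich shared knowledge, while the efficient anatomy-specific learners are trained on their target anatomy to capture exclusive knowledge. Four different implementations of anatomy-specific learners are presented and explored on top of our framework in two MRI reconstruction networks. Comprehensive experiments on brain, knee, and cardiac MRI datasets show that three of these learners are able to enhance reconstruction performance through multiple-anatomy collaborative learning.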

VSVC: Backdoor attack against Keyword Spotting based on Voiceprint Selection and Voice Conversion

Hanbo Cai , Pengcheng Zhang , Hai Dong , Yan Xiao , Shunhui Ji

Categories: Artificial Intelligence | Machine Learning

2022-12-20

Keyword spotting (KWS) based on deep neural networks (DNNs) has achieved massive success in voice control scenarios. However, training such DNN-based KWS systems often requires significant data and hardware resources, so manufacturers often entrust this process to a third-party platform. This makes the training process uncontrollable: attackers can implant backdoors in the model by manipulating third-party training data. An effective backdoor attack can force the model to make specified judgments under certain conditions, i.e., triggers. In this paper, we design a backdoor attack scheme based on Voiceprint Selection and Voice Conversion, abbreviated as VSVC. Experimental results demonstrate that VSVC can achieve an average attack success rate close to 97% on four victim models while poisoning less than 1% of the training data.

Improving Text-to-SQL Semantic Parsing with Fine-grained Query Understanding

Jun Wang , Patrick Ng , Alexander Hanbo Li , Jiarong Jiang , Zhiguo Wang , Ramesh Nallapati , Bing Xiang , Sudipta Sengupta

Category: Natural Language Processing

2022-09-28

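Most recent research on Text-to-SQL semantic parsing relies either on the parser itself or on simple heuristics to understand the natural language query (NLQ). When synthesizing a SQL query, no explicit semantic information about the NLQ is available to the parser, which leads to poor generalization. In addition, without lexical-level fine-grained query understanding, linking between the query and the database can only rely on fuzzy string matching, which leads to suboptimal performance in real applications. In view of this, we present a general-purpose, modular neural semantic parsing framework based on token-level fine-grained query understanding. Our framework consists of three modules: a named entity recognizer (NER), a neural entity linker (NEL), and a neural semantic parser (NSP). By jointly modeling the query and the database, the NER model analyzes user intent and identifies entities in the query. The NEL model links typed entities to schema elements and cell values in the database. The parser model leverages the available semantic information and linking results to synthesize tree-structured SQL queries based on a dynamically generated grammar. Experiments on SQUALL, a newly released semantic parsing dataset, show that we achieve 56.8% execution accuracy on the WikiTableQuestions (WTQ) test set, outperforming the state-of-the-art model by 2.7%.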

Dual Reader-Parser on Hybrid Textual and Tabular Evidence for Open Domain Question Answering

Alexander Hanbo Li , Patrick Ng , Peng Xu , Henghui Zhu , Zhiguo Wang , Bing Xiang

Category: Natural Language Processing

2021-08-05

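Current state-of-the-art generative models for open-domain question answering (ODQA) focus on generating direct answers from unstructured textual information. However, a large amount of the world's knowledge is stored in structured databases and needs to be accessed with query languages such as SQL. Furthermore, query languages can answer questions that require complex reasoning and offer full explainability. In this paper, we propose a hybrid framework that takes both textual and tabular evidence as input and generates either direct answers or SQL queries, depending on which form better answers the question. The generated SQL queries can then be executed on the associated databases to obtain the final answers. To the best of our knowledge, this is the first paper that applies Text2SQL to ODQA tasks. Empirically, we demonstrate that on several ODQA datasets the hybrid methods consistently outperform, by a large margin, baseline models that take only homogeneous input. Specifically, we achieve state-of-the-art performance on the OpenSQuAD dataset using a T5-base model. In a detailed analysis, we demonstrate that being able to generate structured SQL queries always brings gains, especially for questions that require complex reasoning.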
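
The "direct answer or SQL query" behaviour described above can be illustrated with a small sketch. It assumes, purely for illustration, that the generator prefixes its output with `answer:` or `sql:`; that prefix convention, the function name `resolve_output`, and the toy `medals` table are hypothetical and are not taken from the paper. Only the execution step uses a real API, Python's standard `sqlite3` module.

```python
import sqlite3

def resolve_output(generated, connection):
    """Turn a generated string into final answers.

    If the (assumed) prefix is "sql", execute the query on the associated
    database; otherwise treat the payload as a direct textual answer.
    """
    kind, _, payload = generated.partition(":")
    payload = payload.strip()
    if kind.strip().lower() == "sql":
        rows = connection.execute(payload).fetchall()
        return [value for row in rows for value in row]
    return [payload]

# Toy database standing in for the tabular evidence.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE medals (country TEXT, gold INTEGER)")
conn.executemany(
    "INSERT INTO medals VALUES (?, ?)", [("A", 3), ("B", 5), ("C", 1)]
)

sql_result = resolve_output(
    "sql: SELECT country FROM medals ORDER BY gold DESC LIMIT 1", conn
)
text_result = resolve_output("answer: Paris", conn)
```

The SQL branch makes the reasoning inspectable, since the executed query itself documents how the answer was derived, which matches the explainability argument in the abstract.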

REGRAD: A Large-Scale Relational Grasp Dataset for Safe and Object-Specific Robotic Grasping in Clutter

Hanbo Zhang , Deyu Yang , Han Wang , Binglei Zhao , Xuguang Lan , Jishiyu Ding , Nanning Zheng

Categories: Robotics | Computer Vision

2021-04-29

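Despite impressive progress in robotic grasping, robots are still not skilled at sophisticated tasks (e.g., searching for and grasping a specified target in clutter). Such tasks involve not only grasping but also a comprehensive perception of the world (e.g., object relationships). Recently, encouraging results have shown that high-level concepts can be understood through learning. However, such algorithms are usually data-intensive, and the lack of data severely limits their performance. In this paper, we present a new dataset named REGRAD for learning the relationships among objects and grasps. We collect annotations of object poses, segmentations, grasps, and relationships for target-driven relational grasping tasks. Our dataset is collected in two forms: 2D images and 3D point clouds. Moreover, since all the data are generated automatically, new objects can be freely imported for data generation. We also release a real-world validation dataset to evaluate the sim-to-real performance of models trained on REGRAD. Finally, we conduct a series of experiments showing that models trained on REGRAD generalize well to realistic scenarios in terms of both relationship and grasp detection. Our dataset and code can be found at: https://github.com/poisonwine/gerad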

Generative appearance replay for continual unsupervised domain adaptation

Boqi Chen , Kevin Thandiackal , Pushpak Pati , Orcun Goksel

Categories: Computer Vision | Artificial Intelligence

2023-01-03

Deep learning models can achieve high accuracy when trained on large amounts of labeled data. However, real-world scenarios often involve several challenges: Training data may become available in installments, may originate from multiple different domains, and may not contain labels for training. Certain settings, for instance medical applications, often involve further restrictions that prohibit retention of previously seen data due to privacy regulations. In this work, to address such challenges, we study unsupervised segmentation in continual learning scenarios that involve domain shift. To that end, we introduce GarDA (Generative Appearance Replay for continual Domain Adaptation), a generative-replay based approach that can adapt a segmentation model sequentially to new domains with unlabeled data. In contrast to single-step unsupervised domain adaptation (UDA), continual adaptation to a sequence of domains enables leveraging and consolidation of information from multiple domains. Unlike previous approaches in incremental UDA, our method does not require access to previously seen data, making it applicable in many practical scenarios. We evaluate GarDA on two datasets with different organs and modalities, where it substantially outperforms existing techniques.

MGTAB: A Multi-Relational Graph-Based Twitter Account Detection Benchmark

Shuhao Shi , Kai Qiao , Jian Chen , Shuai Yang , Jie Yang , Baojie Song , Linyuan Wang , Bin Yan

Category: Computer Vision

2023-01-03

The development of social media user stance detection and bot detection methods relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which holds back graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB is built on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extract the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we perform a thorough evaluation of MGTAB and other public datasets. Our experiments show that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
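
Selecting the user property features with the greatest information gain, as mentioned above, can be sketched as follows. This is a generic illustration of information-gain ranking for discrete features, not the MGTAB pipeline: the function names and the toy data are assumptions, and continuous properties would need to be discretized first.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature, labels):
    """H(labels) minus the expected entropy of labels given the feature value."""
    gain = entropy(labels)
    for value in np.unique(feature):
        mask = feature == value
        gain -= mask.mean() * entropy(labels[mask])
    return gain

def top_k_features(X, y, k):
    """Return the indices of the k columns of X with the largest gain."""
    gains = np.array([information_gain(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(-gains)[:k], gains

# Toy data: column 0 determines the label, columns 1 and 2 carry no signal.
y = np.array([0, 0, 1, 1, 0, 0, 1, 1])
X = np.stack([y, np.zeros(8, dtype=int), np.array([0, 1, 0, 1, 0, 1, 0, 1])], axis=1)
selected, gains = top_k_features(X, y, k=1)
```

With `k=20` and the real property columns, the same ranking would yield a feature subset of the kind the abstract describes.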

Explaining Imitation Learning through Frames

Boyuan Zheng , Jianlong Zhou , Chunjie Liu , Yiqiao Li , Fang Chen

Categories: Machine Learning | Computer Vision

2023-01-03

As one of the prevalent methods for building automation systems, Imitation Learning (IL) shows promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explanation framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model on randomly masked demonstrations and uses the conventional evaluation outcome, the environment returns, as coefficients to build an importance map. We also conduct experiments to investigate three major questions concerning the equality of frames' importance, the effectiveness of the importance map, and connections between importance maps from different IL models. The results show that R2RISE successfully distinguishes important frames from the demonstrations.
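
The masking-and-retraining loop described above can be sketched in a few lines. This is a simplified illustration rather than the R2RISE implementation: `train_and_evaluate` is a hypothetical callback standing in for "retrain the IL model on the masked demonstration and return its environment return", and averaging the returns per kept frame is an assumed normalization.

```python
import numpy as np

def frame_importance(num_frames, train_and_evaluate, num_masks=200,
                     keep_prob=0.5, seed=0):
    """Estimate per-frame importance from randomly masked demonstrations.

    Each random mask keeps a subset of frames; the return obtained after
    retraining on that subset is credited to every frame that was kept.
    """
    rng = np.random.default_rng(seed)
    weighted = np.zeros(num_frames)
    counts = np.zeros(num_frames)
    for _ in range(num_masks):
        mask = rng.random(num_frames) < keep_prob   # True = frame is kept
        score = train_and_evaluate(mask)            # environment return
        weighted += score * mask
        counts += mask
    return weighted / np.maximum(counts, 1)         # mean return when kept

# Toy stand-in for retraining: only frame 2 contributes to the return.
true_weights = np.array([0.0, 0.0, 5.0, 0.0, 0.0, 0.0])

def toy_evaluate(mask):
    return float(true_weights[mask].sum())

importance = frame_importance(len(true_weights), toy_evaluate)
```

In this toy setting the frame that actually drives the return receives the highest score, while the remaining frames receive roughly the average return of a random mask.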
